Overview

Dataset statistics

Number of variables: 21
Number of observations: 55424
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 51
Duplicate rows (%): 0.1%
Total size in memory: 47.9 MiB
Average record size in memory: 906.2 B
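
The percentage and per-record figures above follow directly from the raw counts. A minimal sketch of the arithmetic, using only values reported in this section (the variable names are illustrative, and 1 MiB is taken to be 1024 * 1024 bytes):

```python
# Values copied from the Dataset statistics above.
n_rows = 55424
duplicate_rows = 51
total_size_mib = 47.9

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * duplicate_rows / n_rows

# Average record size in bytes, assuming 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Because the total size is itself rounded to one decimal place, the recomputed record size can only be expected to agree with the reported figure to about that precision.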

Variable types

Categorical: 12
Numeric: 6
Boolean: 3

Alerts

Dataset has 51 (0.1%) duplicate rows [Duplicates]
vehicle_age is highly overall correlated with insurance_premium [High correlation]
insurance_premium is highly overall correlated with vehicle_age [High correlation]
direction is highly overall correlated with intersection [High correlation]
intersection is highly overall correlated with direction and 1 other field [High correlation]
weather_1 is highly overall correlated with road_surface [High correlation]
primary_collision_factor is highly overall correlated with pcf_violation_category [High correlation]
pcf_violation_category is highly overall correlated with intersection and 1 other field [High correlation]
road_surface is highly overall correlated with weather_1 [High correlation]
party_sobriety is highly overall correlated with party_drug_physical [High correlation]
party_drug_physical is highly overall correlated with party_sobriety [High correlation]
weather_1 is highly imbalanced (69.0%) [Imbalance]
primary_collision_factor is highly imbalanced (97.7%) [Imbalance]
road_surface is highly imbalanced (74.5%) [Imbalance]
road_condition_1 is highly imbalanced (90.9%) [Imbalance]
party_sobriety is highly imbalanced (70.1%) [Imbalance]
party_drug_physical is highly imbalanced (83.2%) [Imbalance]
cellphone_in_use is highly imbalanced (86.7%) [Imbalance]
vehicle_age has 2985 (5.4%) zeros [Zeros]
distance has 13334 (24.1%) zeros [Zeros]
collision_time has 1132 (2.0%) zeros [Zeros]
insurance_premium has 1129 (2.0%) zeros [Zeros]
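
The Imbalance alerts report a score between 0% and 100%. The reported scores are consistent with one minus the Shannon entropy of the class frequencies divided by its maximum possible value, log2 of the number of classes. That reading is an assumption checked against the class counts listed under each variable in this report, not a statement about the tool's implementation. A minimal sketch, where `imbalance_score` is a hypothetical helper:

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class frequencies and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    if len(counts) < 2:
        return 0.0  # a single class has no balance to measure
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))

# Class counts taken from the Common Values tables in this report.
examples = {
    "weather_1": [44460, 8277, 2162, 187, 151, 145, 32, 10],
    "road_surface": [49858, 5191, 340, 35],
    "primary_collision_factor": [55214, 209, 1],
}

for name, counts in examples.items():
    print(f"{name}: {100 * imbalance_score(counts):.1f}%")
```

For these three variables the computed scores agree with the alert values to one decimal place, which supports the assumed formula without proving it.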

Reproduction

Analysis started: 2023-11-24 07:57:58.031851
Analysis finished: 2023-11-24 07:58:13.247225
Duration: 15.22 seconds
Software version: ydata-profiling v4.6.1
Download configuration: config.json

Variables

vehicle_type
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
sedan        34837
coupe        18063
hatchback    1579
minivan      909
other        36

Length

Max length9
Median length5
Mean length5.1467595
Min length5

Characters and Unicode

Total characters285254
Distinct characters17
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
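
The character statistics for vehicle_type can be rebuilt from the category counts alone, because every character of a value occurs once per row that holds that value. A minimal sketch under that reading (the variable names are illustrative):

```python
from collections import Counter

# Category counts from the vehicle_type summary above.
value_counts = {"sedan": 34837, "coupe": 18063, "hatchback": 1579,
                "minivan": 909, "other": 36}

n_rows = sum(value_counts.values())

# Weight each character of a value by the number of rows with that value.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
print("Total characters:", total_chars)
print("Distinct characters:", len(char_counts))
print("Mean length:", round(total_chars / n_rows, 7))
print("Top characters:", char_counts.most_common(3))
```

The same weighting explains the Length figures: the mean length is the total number of characters divided by the number of rows.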

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowsedan
2nd rowsedan
3rd rowsedan
4th rowsedan
5th rowcoupe

Common Values

Value        Count    Frequency (%)
sedan        34837    62.9%
coupe        18063    32.6%
hatchback    1579     2.8%
minivan      909      1.6%
other        36       0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
sedan 34837
62.9%
coupe 18063
32.6%
hatchback 1579
 
2.8%
minivan 909
 
1.6%
other 36
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e 52936
18.6%
a 38904
13.6%
n 36655
12.8%
s 34837
12.2%
d 34837
12.2%
c 21221
7.4%
o 18099
 
6.3%
p 18063
 
6.3%
u 18063
 
6.3%
h 3194
 
1.1%
Other values (7) 8445
 
3.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 285254
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 52936
18.6%
a 38904
13.6%
n 36655
12.8%
s 34837
12.2%
d 34837
12.2%
c 21221
7.4%
o 18099
 
6.3%
p 18063
 
6.3%
u 18063
 
6.3%
h 3194
 
1.1%
Other values (7) 8445
 
3.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 285254
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 52936
18.6%
a 38904
13.6%
n 36655
12.8%
s 34837
12.2%
d 34837
12.2%
c 21221
7.4%
o 18099
 
6.3%
p 18063
 
6.3%
u 18063
 
6.3%
h 3194
 
1.1%
Other values (7) 8445
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 285254
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 52936
18.6%
a 38904
13.6%
n 36655
12.8%
s 34837
12.2%
d 34837
12.2%
c 21221
7.4%
o 18099
 
6.3%
p 18063
 
6.3%
u 18063
 
6.3%
h 3194
 
1.1%
Other values (7) 8445
 
3.0%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
manual            29260
auto              25311
not applicable    853

Length

Max length14
Median length6
Mean length5.2097647
Min length4

Characters and Unicode

Total characters288746
Distinct characters13
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowmanual
2nd rowmanual
3rd rowauto
4th rowauto
5th rowmanual

Common Values

Value             Count    Frequency (%)
manual            29260    52.8%
auto              25311    45.7%
not applicable    853      1.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
manual 29260
52.0%
auto 25311
45.0%
not 853
 
1.5%
applicable 853
 
1.5%
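
The table under Common Values (Plot) appears to count whitespace-separated words rather than whole values. That would explain why "not applicable" shows up as two rows and why the percentages differ from the Common Values table: the denominator is the total number of words, not the number of rows. This is an inference from the numbers, sketched below with illustrative names:

```python
from collections import Counter

# Value counts from the Common Values table for this variable.
value_counts = {"manual": 29260, "auto": 25311, "not applicable": 853}

# Split each value on whitespace and weight every word by the row count.
word_counts = Counter()
for value, count in value_counts.items():
    for word in value.split():
        word_counts[word] += count

total_words = sum(word_counts.values())
word_pct = {w: 100 * c / total_words for w, c in word_counts.items()}

for word, count in word_counts.most_common():
    print(f"{word:<12}{count:>7}{word_pct[word]:>7.1f}%")
```

With 56277 words in total, the recomputed shares round to the percentages shown in the word table.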

Most occurring characters

ValueCountFrequency (%)
a 85537
29.6%
u 54571
18.9%
l 30966
 
10.7%
n 30113
 
10.4%
m 29260
 
10.1%
t 26164
 
9.1%
o 26164
 
9.1%
p 1706
 
0.6%
853
 
0.3%
i 853
 
0.3%
Other values (3) 2559
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 287893
99.7%
Space Separator 853
 
0.3%
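
The category names in this table correspond to Unicode general categories. A minimal sketch of how such counts could be reproduced with Python's standard `unicodedata` module, using the value counts reported for this variable (whether the profiling tool uses this module is not stated in the report):

```python
import unicodedata
from collections import Counter

# Value counts from the Common Values table for this variable.
value_counts = {"manual": 29260, "auto": 25311, "not applicable": 853}

# unicodedata.category returns a two-letter general category code,
# e.g. "Ll" (Lowercase Letter) or "Zs" (Space Separator).
category_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        category_counts[unicodedata.category(ch)] += count

print(category_counts)
```

The single space in "not applicable" accounts for every Space Separator character, one per row holding that value.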

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 85537
29.7%
u 54571
19.0%
l 30966
 
10.8%
n 30113
 
10.5%
m 29260
 
10.2%
t 26164
 
9.1%
o 26164
 
9.1%
p 1706
 
0.6%
i 853
 
0.3%
c 853
 
0.3%
Other values (2) 1706
 
0.6%
Space Separator
ValueCountFrequency (%)
853
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 287893
99.7%
Common 853
 
0.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 85537
29.7%
u 54571
19.0%
l 30966
 
10.8%
n 30113
 
10.5%
m 29260
 
10.2%
t 26164
 
9.1%
o 26164
 
9.1%
p 1706
 
0.6%
i 853
 
0.3%
c 853
 
0.3%
Other values (2) 1706
 
0.6%
Common
ValueCountFrequency (%)
853
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 288746
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 85537
29.6%
u 54571
18.9%
l 30966
 
10.7%
n 30113
 
10.4%
m 29260
 
10.1%
t 26164
 
9.1%
o 26164
 
9.1%
p 1706
 
0.6%
853
 
0.3%
i 853
 
0.3%
Other values (3) 2559
 
0.9%

vehicle_age
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct19
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4.8123557
Minimum0
Maximum19
Zeros2985
Zeros (%)5.4%
Negative0
Negative (%)0.0%
Memory size866.0 KiB
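
The reported memory size is consistent with 16 bytes per row. One plausible reading, which is an assumption and not confirmed by the report, is an 8-byte 64-bit value plus an 8-byte index entry for each observation:

```python
n_rows = 55424  # number of observations from the Overview

# Assumed layout: one 8-byte float64 value plus one 8-byte index entry per row.
bytes_per_row = 8 + 8

memory_kib = n_rows * bytes_per_row / 1024
print(f"{memory_kib:.1f} KiB")
```

Under the same assumption, a 1-byte value plus an 8-byte index entry gives 55424 * 9 / 1024, or about 487.1 KiB, which matches the size reported for the Boolean variables.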

Quantile statistics

Minimum0
5-th percentile0
Q13
median4
Q37
95-th percentile11
Maximum19
Range19
Interquartile range (IQR)4

Descriptive statistics

Standard deviation3.0667019
Coefficient of variation (CV)0.63725587
Kurtosis0.018547115
Mean4.8123557
Median Absolute Deviation (MAD)2
Skewness0.73735524
Sum266720
Variance9.4046605
MonotonicityNot monotonic
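
Several of these statistics are derived from one another, so they can be cross-checked. A minimal sketch using only the vehicle_age figures reported above (the variable names are illustrative):

```python
# Values copied from the vehicle_age statistics above.
n_rows = 55424
mean = 4.8123557
std = 3.0667019
q1, q3 = 3, 7
minimum, maximum = 0, 19

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
total = mean * n_rows             # sum of all values

print(f"CV       = {cv:.8f}")
print(f"Variance = {variance:.7f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Sum      = {total:.0f}")
```

Small differences in the last digits are expected, because the inputs are themselves rounded to eight significant figures.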
Histogram with fixed size bins (bins=19)
Value               Count    Frequency (%)
3                   12434    22.4%
4                   7044     12.7%
2                   6072     11.0%
5                   5441     9.8%
6                   3913     7.1%
7                   3806     6.9%
8                   3484     6.3%
0                   2985     5.4%
9                   2768     5.0%
1                   2437     4.4%
Other values (9)    5040     9.1%

Value    Count    Frequency (%)
0        2985     5.4%
1        2437     4.4%
2        6072     11.0%
3        12434    22.4%
4        7044     12.7%
5        5441     9.8%
6        3913     7.1%
7        3806     6.9%
8        3484     6.3%
9        2768     5.0%

Value    Count    Frequency (%)
19       1        < 0.1%
17       3        < 0.1%
16       7        < 0.1%
15       41       0.1%
14       283      0.5%
13       557      1.0%
12       861      1.6%
11       1350     2.4%
10       1937     3.5%
9        2768     5.0%
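
Taken together, the value counts above cover all 19 distinct values of vehicle_age (the value 18 does not appear), so the mean and the quantiles can be recomputed from the frequency table. The sketch below uses a simple cumulative-count rule for quantiles. The profiling tool may interpolate differently, but for integer data with this many ties the results coincide here. The `quantile` helper is hypothetical:

```python
# Full frequency table for vehicle_age, assembled from the value counts above.
counts = {0: 2985, 1: 2437, 2: 6072, 3: 12434, 4: 7044, 5: 5441,
          6: 3913, 7: 3806, 8: 3484, 9: 2768, 10: 1937, 11: 1350,
          12: 861, 13: 557, 14: 283, 15: 41, 16: 7, 17: 3, 19: 1}

n = sum(counts.values())
mean = sum(value * count for value, count in counts.items()) / n

def quantile(q):
    """Smallest value whose cumulative count reaches q * n.
    A simple rule that needs no interpolation between neighbours."""
    threshold = q * n
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        if cumulative >= threshold:
            return value
    return max(counts)

print("n      =", n)
print("mean   =", round(mean, 7))
for label, q in [("5%", 0.05), ("Q1", 0.25), ("median", 0.5),
                 ("Q3", 0.75), ("95%", 0.95)]:
    print(f"{label:<7}= {quantile(q)}")
```

The recomputed values can be compared against the Quantile statistics and Descriptive statistics reported for this variable.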

county_city_location
Real number (ℝ)

Distinct498
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2836.0159
Minimum100
Maximum5802
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size866.0 KiB

Quantile statistics

Minimum100
5-th percentile702
Q11942
median3008
Q33701
95-th percentile5200
Maximum5802
Range5702
Interquartile range (IQR)1759

Descriptive statistics

Standard deviation1296.6935
Coefficient of variation (CV)0.45722364
Kurtosis-0.4186666
Mean2836.0159
Median Absolute Deviation (MAD)1066
Skewness0.10222305
Sum1.5718334 × 108
Variance1681414
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1942 7086
 
12.8%
1900 2179
 
3.9%
3711 1352
 
2.4%
3400 1051
 
1.9%
1500 923
 
1.7%
3700 918
 
1.7%
3300 909
 
1.6%
3600 903
 
1.6%
4313 879
 
1.6%
3001 874
 
1.6%
Other values (488) 38350
69.2%
ValueCountFrequency (%)
100 355
0.6%
101 58
 
0.1%
102 16
 
< 0.1%
103 128
 
0.2%
104 68
 
0.1%
105 226
0.4%
106 153
 
0.3%
107 94
 
0.2%
108 38
 
0.1%
109 496
0.9%
ValueCountFrequency (%)
5802 4
 
< 0.1%
5801 25
 
< 0.1%
5800 81
0.1%
5704 82
0.1%
5703 42
 
0.1%
5702 3
 
< 0.1%
5701 49
0.1%
5700 107
0.2%
5690 44
 
0.1%
5609 120
0.2%

distance
Real number (ℝ)

ZEROS 

Distinct1773
Distinct (%)3.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean688.58936
Minimum0
Maximum19536
Zeros13334
Zeros (%)24.1%
Negative0
Negative (%)0.0%
Memory size866.0 KiB

Quantile statistics

Minimum0
5-th percentile0
Q15
median120
Q3528
95-th percentile3100
Maximum19536
Range19536
Interquartile range (IQR)523

Descriptive statistics

Standard deviation1699.367
Coefficient of variation (CV)2.4678962
Kurtosis37.109808
Mean688.58936
Median Absolute Deviation (MAD)120
Skewness5.3749897
Sum38164377
Variance2887848.3
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 13334
24.1%
100 1743
 
3.1%
200 1509
 
2.7%
1056 1380
 
2.5%
528 1377
 
2.5%
300 1298
 
2.3%
500 1227
 
2.2%
50 1223
 
2.2%
1584 1051
 
1.9%
2640 1001
 
1.8%
Other values (1763) 30281
54.6%
ValueCountFrequency (%)
0 13334
24.1%
1 75
 
0.1%
1.1 3
 
< 0.1%
1.17 2
 
< 0.1%
1.2 2
 
< 0.1%
1.3 2
 
< 0.1%
1.33 1
 
< 0.1%
1.4 3
 
< 0.1%
1.5 3
 
< 0.1%
1.8 2
 
< 0.1%
ValueCountFrequency (%)
19536 10
< 0.1%
19325 1
 
< 0.1%
19219 1
 
< 0.1%
19008 8
< 0.1%
18850 1
 
< 0.1%
18797 1
 
< 0.1%
18480 17
< 0.1%
18322 1
 
< 0.1%
18216 2
 
< 0.1%
17952 6
 
< 0.1%

direction
Categorical

HIGH CORRELATION 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
unknown    13134
north      11757
south      11616
west       9571
east       9346

Length

Max length7
Median length5
Mean length5.1326321
Min length4

Characters and Unicode

Total characters284471
Distinct characters11
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st roweast
2nd rowsouth
3rd rowunknown
4th roweast
5th rowsouth

Common Values

Value      Count    Frequency (%)
unknown    13134    23.7%
north      11757    21.2%
south      11616    21.0%
west       9571     17.3%
east       9346     16.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
unknown 13134
23.7%
north 11757
21.2%
south 11616
21.0%
west 9571
17.3%
east 9346
16.9%

Most occurring characters

ValueCountFrequency (%)
n 51159
18.0%
t 42290
14.9%
o 36507
12.8%
s 30533
10.7%
u 24750
8.7%
h 23373
8.2%
w 22705
8.0%
e 18917
 
6.6%
k 13134
 
4.6%
r 11757
 
4.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 284471
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 51159
18.0%
t 42290
14.9%
o 36507
12.8%
s 30533
10.7%
u 24750
8.7%
h 23373
8.2%
w 22705
8.0%
e 18917
 
6.6%
k 13134
 
4.6%
r 11757
 
4.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 284471
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 51159
18.0%
t 42290
14.9%
o 36507
12.8%
s 30533
10.7%
u 24750
8.7%
h 23373
8.2%
w 22705
8.0%
e 18917
 
6.6%
k 13134
 
4.6%
r 11757
 
4.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 284471
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 51159
18.0%
t 42290
14.9%
o 36507
12.8%
s 30533
10.7%
u 24750
8.7%
h 23373
8.2%
w 22705
8.0%
e 18917
 
6.6%
k 13134
 
4.6%
r 11757
 
4.1%

intersection
Boolean

HIGH CORRELATION 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size487.1 KiB
False    42902
True     12522

Value    Count    Frequency (%)
False    42902    77.4%
True     12522    22.6%

weather_1
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
clear               44460
cloudy              8277
raining             2162
fog                 187
unknown             151
Other values (3)    187

Length

Max length7
Median length5
Mean length5.2311093
Min length3

Characters and Unicode

Total characters289929
Distinct characters18
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowclear
2nd rowclear
3rd rowclear
4th rowclear
5th rowclear

Common Values

Value      Count    Frequency (%)
clear      44460    80.2%
cloudy     8277     14.9%
raining    2162     3.9%
fog        187      0.3%
unknown    151      0.3%
snowing    145      0.3%
other      32       0.1%
wind       10       < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
clear 44460
80.2%
cloudy 8277
 
14.9%
raining 2162
 
3.9%
fog 187
 
0.3%
unknown 151
 
0.3%
snowing 145
 
0.3%
other 32
 
0.1%
wind 10
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
c 52737
18.2%
l 52737
18.2%
r 46654
16.1%
a 46622
16.1%
e 44492
15.3%
o 8792
 
3.0%
u 8428
 
2.9%
d 8287
 
2.9%
y 8277
 
2.9%
n 5077
 
1.8%
Other values (8) 7826
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 289929
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 52737
18.2%
l 52737
18.2%
r 46654
16.1%
a 46622
16.1%
e 44492
15.3%
o 8792
 
3.0%
u 8428
 
2.9%
d 8287
 
2.9%
y 8277
 
2.9%
n 5077
 
1.8%
Other values (8) 7826
 
2.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 289929
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
c 52737
18.2%
l 52737
18.2%
r 46654
16.1%
a 46622
16.1%
e 44492
15.3%
o 8792
 
3.0%
u 8428
 
2.9%
d 8287
 
2.9%
y 8277
 
2.9%
n 5077
 
1.8%
Other values (8) 7826
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 289929
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 52737
18.2%
l 52737
18.2%
r 46654
16.1%
a 46622
16.1%
e 44492
15.3%
o 8792
 
3.0%
u 8428
 
2.9%
d 8287
 
2.9%
y 8277
 
2.9%
n 5077
 
1.8%
Other values (8) 7826
 
2.7%

location_type
Categorical

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
road            30751
highway         20209
ramp            3150
intersection    1314

Length

Max length12
Median length4
Mean length5.2835414
Min length4

Characters and Unicode

Total characters292835
Distinct characters16
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowintersection
2nd rowhighway
3rd rowroad
4th rowroad
5th rowhighway

Common Values

Value           Count    Frequency (%)
road            30751    55.5%
highway         20209    36.5%
ramp            3150     5.7%
intersection    1314     2.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
road 30751
55.5%
highway 20209
36.5%
ramp 3150
 
5.7%
intersection 1314
 
2.4%

Most occurring characters

ValueCountFrequency (%)
a 54110
18.5%
h 40418
13.8%
r 35215
12.0%
o 32065
10.9%
d 30751
10.5%
i 22837
7.8%
g 20209
 
6.9%
w 20209
 
6.9%
y 20209
 
6.9%
m 3150
 
1.1%
Other values (6) 13662
 
4.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 292835
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 54110
18.5%
h 40418
13.8%
r 35215
12.0%
o 32065
10.9%
d 30751
10.5%
i 22837
7.8%
g 20209
 
6.9%
w 20209
 
6.9%
y 20209
 
6.9%
m 3150
 
1.1%
Other values (6) 13662
 
4.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 292835
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 54110
18.5%
h 40418
13.8%
r 35215
12.0%
o 32065
10.9%
d 30751
10.5%
i 22837
7.8%
g 20209
 
6.9%
w 20209
 
6.9%
y 20209
 
6.9%
m 3150
 
1.1%
Other values (6) 13662
 
4.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 292835
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 54110
18.5%
h 40418
13.8%
r 35215
12.0%
o 32065
10.9%
d 30751
10.5%
i 22837
7.8%
g 20209
 
6.9%
w 20209
 
6.9%
y 20209
 
6.9%
m 3150
 
1.1%
Other values (6) 13662
 
4.7%

primary_collision_factor
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.6 MiB
vehicle code violation    55214
other improper driving    209
fell asleep               1

Length

Max length22
Median length22
Mean length21.999802
Min length11

Characters and Unicode

Total characters1219317
Distinct characters18
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowvehicle code violation
2nd rowvehicle code violation
3rd rowvehicle code violation
4th rowvehicle code violation
5th rowvehicle code violation

Common Values

Value                     Count    Frequency (%)
vehicle code violation    55214    99.6%
other improper driving    209      0.4%
fell asleep               1        < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
vehicle 55214
33.2%
code 55214
33.2%
violation 55214
33.2%
other 209
 
0.1%
improper 209
 
0.1%
driving 209
 
0.1%
fell 1
 
< 0.1%
asleep 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
i 166269
13.6%
e 166063
13.6%
o 166060
13.6%
110847
9.1%
v 110637
9.1%
l 110431
9.1%
c 110428
9.1%
n 55423
 
4.5%
h 55423
 
4.5%
d 55423
 
4.5%
Other values (8) 112313
9.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1108470
90.9%
Space Separator 110847
 
9.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 166269
15.0%
e 166063
15.0%
o 166060
15.0%
v 110637
10.0%
l 110431
10.0%
c 110428
10.0%
n 55423
 
5.0%
h 55423
 
5.0%
d 55423
 
5.0%
t 55423
 
5.0%
Other values (7) 56890
 
5.1%
Space Separator
ValueCountFrequency (%)
110847
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1108470
90.9%
Common 110847
 
9.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 166269
15.0%
e 166063
15.0%
o 166060
15.0%
v 110637
10.0%
l 110431
10.0%
c 110428
10.0%
n 55423
 
5.0%
h 55423
 
5.0%
d 55423
 
5.0%
t 55423
 
5.0%
Other values (7) 56890
 
5.1%
Common
ValueCountFrequency (%)
110847
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1219317
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 166269
13.6%
e 166063
13.6%
o 166060
13.6%
110847
9.1%
v 110637
9.1%
l 110431
9.1%
c 110428
9.1%
n 55423
 
4.5%
h 55423
 
4.5%
d 55423
 
4.5%
Other values (8) 112313
9.2%

pcf_violation_category
Categorical

HIGH CORRELATION 

Distinct21
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.2 MiB
speeding                   18477
improper turning           8507
automobile right of way    7397
dui                        6339
unsafe lane change         4431
Other values (16)          10273

Length

Max length26
Median length23
Mean length14.001696
Min length3

Characters and Unicode

Total characters776030
Distinct characters25
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)< 0.1%

Sample

1st rowautomobile right of way
2nd rowspeeding
3rd rowautomobile right of way
4th rowautomobile right of way
5th rowimproper turning

Common Values

Value                         Count    Frequency (%)
speeding                      18477    33.3%
improper turning              8507     15.3%
automobile right of way       7397     13.3%
dui                           6339     11.4%
unsafe lane change            4431     8.0%
traffic signals and signs     3217     5.8%
unsafe starting or backing    1542     2.8%
wrong side of road            1296     2.3%
following too closely         1012     1.8%
pedestrian right of way       953      1.7%
Other values (11)             2253     4.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
speeding 18477
15.4%
of 9646
 
8.0%
improper 9255
 
7.7%
turning 8507
 
7.1%
right 8350
 
6.9%
way 8350
 
6.9%
automobile 7397
 
6.1%
dui 6339
 
5.3%
unsafe 5973
 
5.0%
change 4431
 
3.7%
Other values (28) 33633
27.9%

Most occurring characters

ValueCountFrequency (%)
i 76799
 
9.9%
e 73845
 
9.5%
n 70026
 
9.0%
64934
 
8.4%
g 52407
 
6.8%
a 48092
 
6.2%
r 47043
 
6.1%
o 46356
 
6.0%
s 43873
 
5.7%
p 38678
 
5.0%
Other values (15) 213977
27.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 711096
91.6%
Space Separator 64934
 
8.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 76799
10.8%
e 73845
10.4%
n 70026
 
9.8%
g 52407
 
7.4%
a 48092
 
6.8%
r 47043
 
6.6%
o 46356
 
6.5%
s 43873
 
6.2%
p 38678
 
5.4%
t 34195
 
4.8%
Other values (14) 179782
25.3%
Space Separator
ValueCountFrequency (%)
64934
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 711096
91.6%
Common 64934
 
8.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 76799
10.8%
e 73845
10.4%
n 70026
 
9.8%
g 52407
 
7.4%
a 48092
 
6.8%
r 47043
 
6.6%
o 46356
 
6.5%
s 43873
 
6.2%
p 38678
 
5.4%
t 34195
 
4.8%
Other values (14) 179782
25.3%
Common
ValueCountFrequency (%)
64934
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 776030
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 76799
 
9.9%
e 73845
 
9.5%
n 70026
 
9.0%
64934
 
8.4%
g 52407
 
6.8%
a 48092
 
6.2%
r 47043
 
6.1%
o 46356
 
6.0%
s 43873
 
5.7%
p 38678
 
5.0%
Other values (15) 213977
27.6%

road_surface
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.6 MiB
dry         49858
wet         5191
snowy       340
slippery    35

Length

Max length8
Median length3
Mean length3.0154265
Min length3

Characters and Unicode

Total characters167127
Distinct characters12
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowdry
2nd rowdry
3rd rowdry
4th rowdry
5th rowdry

Common Values

Value       Count    Frequency (%)
dry         49858    90.0%
wet         5191     9.4%
snowy       340      0.6%
slippery    35       0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
dry 49858
90.0%
wet 5191
 
9.4%
snowy 340
 
0.6%
slippery 35
 
0.1%

Most occurring characters

ValueCountFrequency (%)
y 50233
30.1%
r 49893
29.9%
d 49858
29.8%
w 5531
 
3.3%
e 5226
 
3.1%
t 5191
 
3.1%
s 375
 
0.2%
n 340
 
0.2%
o 340
 
0.2%
p 70
 
< 0.1%
Other values (2) 70
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 167127
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
y 50233
30.1%
r 49893
29.9%
d 49858
29.8%
w 5531
 
3.3%
e 5226
 
3.1%
t 5191
 
3.1%
s 375
 
0.2%
n 340
 
0.2%
o 340
 
0.2%
p 70
 
< 0.1%
Other values (2) 70
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 167127
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
y 50233
30.1%
r 49893
29.9%
d 49858
29.8%
w 5531
 
3.3%
e 5226
 
3.1%
t 5191
 
3.1%
s 375
 
0.2%
n 340
 
0.2%
o 340
 
0.2%
p 70
 
< 0.1%
Other values (2) 70
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 167127
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
y 50233
30.1%
r 49893
29.9%
d 49858
29.8%
w 5531
 
3.3%
e 5226
 
3.1%
t 5191
 
3.1%
s 375
 
0.2%
n 340
 
0.2%
o 340
 
0.2%
p 70
 
< 0.1%
Other values (2) 70
 
< 0.1%

road_condition_1
Categorical

IMBALANCE 

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.8 MiB
normal              53644
construction        950
obstruction         209
holes               206
other               205
Other values (3)    210

Length

Max length14
Median length6
Mean length6.1389651
Min length5

Characters and Unicode

Total characters340246
Distinct characters18
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rownormal
2nd rowconstruction
3rd rownormal
4th rownormal
5th rownormal

Common Values

Value             Count    Frequency (%)
normal            53644    96.8%
construction      950      1.7%
obstruction       209      0.4%
holes             206      0.4%
other             205      0.4%
loose material    108      0.2%
reduced width     67       0.1%
flooded           35       0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
normal 53644
96.5%
construction 950
 
1.7%
obstruction 209
 
0.4%
holes 206
 
0.4%
other 205
 
0.4%
loose 108
 
0.2%
material 108
 
0.2%
reduced 67
 
0.1%
width 67
 
0.1%
flooded 35
 
0.1%

Most occurring characters

ValueCountFrequency (%)
o 56659
16.7%
n 55753
16.4%
r 55183
16.2%
l 54101
15.9%
a 53860
15.8%
m 53752
15.8%
t 2698
 
0.8%
c 2176
 
0.6%
s 1473
 
0.4%
i 1334
 
0.4%
Other values (8) 3257
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 340071
99.9%
Space Separator 175
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 56659
16.7%
n 55753
16.4%
r 55183
16.2%
l 54101
15.9%
a 53860
15.8%
m 53752
15.8%
t 2698
 
0.8%
c 2176
 
0.6%
s 1473
 
0.4%
i 1334
 
0.4%
Other values (7) 3082
 
0.9%
Space Separator
ValueCountFrequency (%)
175
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 340071
99.9%
Common 175
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 56659
16.7%
n 55753
16.4%
r 55183
16.2%
l 54101
15.9%
a 53860
15.8%
m 53752
15.8%
t 2698
 
0.8%
c 2176
 
0.6%
s 1473
 
0.4%
i 1334
 
0.4%
Other values (7) 3082
 
0.9%
Common
ValueCountFrequency (%)
175
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 340246
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 56659
16.7%
n 55753
16.4%
r 55183
16.2%
l 54101
15.9%
a 53860
15.8%
m 53752
15.8%
t 2698
 
0.8%
c 2176
 
0.6%
s 1473
 
0.4%
i 1334
 
0.4%
Other values (8) 3257
 
1.0%

lighting
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.1 MiB
daylight                                   37733
dark with street lights                    10996
dark with no street lights                 4830
dusk or dawn                               1731
dark with street lights not functioning    134

Length

Max length39
Median length8
Mean length12.744479
Min length8

Characters and Unicode

Total characters706350
Distinct characters19
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowdaylight
2nd rowdark with street lights
3rd rowdaylight
4th rowdusk or dawn
5th rowdusk or dawn

Common Values

Value                                      Count    Frequency (%)
daylight                                   37733    68.1%
dark with street lights                    10996    19.8%
dark with no street lights                 4830     8.7%
dusk or dawn                               1731     3.1%
dark with street lights not functioning    134      0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
daylight 37733
33.7%
dark 15960
14.3%
with 15960
14.3%
street 15960
14.3%
lights 15960
14.3%
no 4830
 
4.3%
dusk 1731
 
1.5%
or 1731
 
1.5%
dawn 1731
 
1.5%
not 134
 
0.1%

Most occurring characters

ValueCountFrequency (%)
t 101841
14.4%
i 69921
9.9%
h 69653
9.9%
d 57155
8.1%
56440
8.0%
a 55424
7.8%
g 53827
7.6%
l 53693
7.6%
y 37733
 
5.3%
s 33651
 
4.8%
Other values (9) 117012
16.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 649910
92.0%
Space Separator 56440
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 101841
15.7%
i 69921
10.8%
h 69653
10.7%
d 57155
8.8%
a 55424
8.5%
g 53827
8.3%
l 53693
8.3%
y 37733
 
5.8%
s 33651
 
5.2%
r 33651
 
5.2%
Other values (8) 83361
12.8%
Space Separator
ValueCountFrequency (%)
56440
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 649910
92.0%
Common 56440
 
8.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 101841
15.7%
i 69921
10.8%
h 69653
10.7%
d 57155
8.8%
a 55424
8.5%
g 53827
8.3%
l 53693
8.3%
y 37733
 
5.8%
s 33651
 
5.2%
r 33651
 
5.2%
Other values (8) 83361
12.8%
Common
ValueCountFrequency (%)
56440
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 706350
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 101841
14.4%
i 69921
9.9%
h 69653
9.9%
d 57155
8.1%
56440
8.0%
a 55424
7.8%
g 53827
7.6%
l 53693
7.6%
y 37733
 
5.3%
s 33651
 
4.8%
Other values (9) 117012
16.6%

collision_time
Real number (ℝ)

ZEROS 

Distinct24
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean13.278995
Minimum0
Maximum23
Zeros1132
Zeros (%)2.0%
Negative0
Negative (%)0.0%
Memory size866.0 KiB

Quantile statistics

Minimum0
5-th percentile2
Q110
median14
Q317
95-th percentile22
Maximum23
Range23
Interquartile range (IQR)7

Descriptive statistics

Standard deviation5.545515
Coefficient of variation (CV)0.41761557
Kurtosis-0.30864422
Mean13.278995
Median Absolute Deviation (MAD)4
Skewness-0.49874449
Sum735975
Variance30.752736
MonotonicityNot monotonic
Histogram with fixed size bins (bins=24)
ValueCountFrequency (%)
15 4814
 
8.7%
17 4221
 
7.6%
16 4082
 
7.4%
14 4028
 
7.3%
18 3663
 
6.6%
13 3490
 
6.3%
12 3276
 
5.9%
11 2790
 
5.0%
7 2598
 
4.7%
19 2523
 
4.6%
Other values (14) 19939
36.0%
ValueCountFrequency (%)
0 1132
2.0%
1 1110
2.0%
2 1157
2.1%
3 708
 
1.3%
4 472
 
0.9%
5 680
 
1.2%
6 1118
2.0%
7 2598
4.7%
8 2505
4.5%
9 2014
3.6%
ValueCountFrequency (%)
23 1337
 
2.4%
22 1528
 
2.8%
21 1848
 
3.3%
20 2069
3.7%
19 2523
4.6%
18 3663
6.6%
17 4221
7.6%
16 4082
7.4%
15 4814
8.7%
14 4028
7.3%

at_fault
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size487.1 KiB
True     28211
False    27213

Value    Count    Frequency (%)
True     28211    50.9%
False    27213    49.1%

insurance_premium
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct      103
Distinct (%)  0.2%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          36.587994
Minimum       0
Maximum       102
Zeros         1129
Zeros (%)     2.0%
Negative      0
Negative (%)  0.0%
Memory size   866.0 KiB

Quantile statistics

Minimum                    0
5-th percentile            17
Q1                         23
median                     33
Q3                         48
95-th percentile           68
Maximum                    102
Range                      102
Interquartile range (IQR)  25

Descriptive statistics

Standard deviation               17.140496
Coefficient of variation (CV)    0.46847323
Kurtosis                         0.042619936
Mean                             36.587994
Median Absolute Deviation (MAD)  12
Skewness                         0.60946182
Sum                              2027853
Variance                         293.7966
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
19                 2108   3.8%
21                 2069   3.7%
20                 2047   3.7%
22                 1891   3.4%
18                 1809   3.3%
23                 1759   3.2%
24                 1652   3.0%
25                 1515   2.7%
26                 1477   2.7%
27                 1349   2.4%
Other values (93)  37748  68.1%

Value  Count  Frequency (%)
0      1129   2.0%
1      7      < 0.1%
2      12     < 0.1%
3      9      < 0.1%
4      13     < 0.1%
5      17     < 0.1%
6      11     < 0.1%
7      21     < 0.1%
8      12     < 0.1%
9      22     < 0.1%

Value  Count  Frequency (%)
102    1      < 0.1%
101    1      < 0.1%
100    1      < 0.1%
99     3      < 0.1%
98     2      < 0.1%
97     3      < 0.1%
96     5      < 0.1%
95     5      < 0.1%
94     3      < 0.1%
93     13     < 0.1%

party_sobriety
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct      6
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   4.6 MiB

had not been drinking                   48205
had been drinking, under influence      4458
impairment unknown                      1239
not applicable                          677
had been drinking, not under influence  569

Length

Max length     38
Median length  21
Mean length    22.147283
Min length     14

Characters and Unicode

Total characters     1227491
Distinct characters  21
Distinct categories  3
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  had not been drinking
2nd row  had been drinking, under influence
3rd row  had not been drinking
4th row  had not been drinking
5th row  impairment unknown

Common Values

Value                                   Count  Frequency (%)
had not been drinking                   48205  87.0%
had been drinking, under influence      4458   8.0%
impairment unknown                      1239   2.2%
not applicable                          677    1.2%
had been drinking, not under influence  569    1.0%
had been drinking, impairment unknown   276    0.5%
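
The IMBALANCE flag on party_sobriety reflects how unevenly its six categories are populated. The following is a minimal Python sketch, assuming the score is one minus the Shannon entropy of the category shares divided by log2 of the number of categories. That formula is an assumption about how the profiler derives the figure, although it does reproduce the 70.1% listed for party_sobriety in the report's alerts. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class shares and k is the number of classes.
    0 means perfectly balanced, values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Counts copied from the party_sobriety Common Values table.
party_sobriety_counts = [48205, 4458, 1239, 677, 569, 276]
sobriety_score = imbalance_score(party_sobriety_counts)
print(f"party_sobriety imbalance: {sobriety_score:.1%}")
```

Applied to the cellphone_in_use counts (54396 False, 1028 True), the same function gives the 86.7% reported for that variable.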

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count  Frequency (%)
had         53508  23.9%
been        53508  23.9%
drinking    53508  23.9%
not         49451  22.1%
under       5027   2.2%
influence   5027   2.2%
impairment  1515   0.7%
unknown     1515   0.7%
applicable  677    0.3%

Most occurring characters

Value              Count   Frequency (%)
n                  231116  18.8%
(space)            168312  13.7%
e                  124289  10.1%
i                  115750  9.4%
d                  112043  9.1%
r                  60050   4.9%
a                  56377   4.6%
k                  55023   4.5%
b                  54185   4.4%
g                  53508   4.4%
Other values (11)  196838  16.0%

Most occurring categories

Value              Count    Frequency (%)
Lowercase Letter   1053876  85.9%
Space Separator    168312   13.7%
Other Punctuation  5303     0.4%

Most frequent character per category

Lowercase Letter
Value             Count   Frequency (%)
n                 231116  21.9%
e                 124289  11.8%
i                 115750  11.0%
d                 112043  10.6%
r                 60050   5.7%
a                 56377   5.3%
k                 55023   5.2%
b                 54185   5.1%
g                 53508   5.1%
h                 53508   5.1%
Other values (9)  138027  13.1%
Space Separator
Value    Count   Frequency (%)
(space)  168312  100.0%
Other Punctuation
Value  Count  Frequency (%)
,      5303   100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1053876  85.9%
Common  173615   14.1%

Most frequent character per script

Latin
Value             Count   Frequency (%)
n                 231116  21.9%
e                 124289  11.8%
i                 115750  11.0%
d                 112043  10.6%
r                 60050   5.7%
a                 56377   5.3%
k                 55023   5.2%
b                 54185   5.1%
g                 53508   5.1%
h                 53508   5.1%
Other values (9)  138027  13.1%
Common
Value    Count   Frequency (%)
(space)  168312  96.9%
,        5303    3.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1227491  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
n                  231116  18.8%
(space)            168312  13.7%
e                  124289  10.1%
i                  115750  9.4%
d                  112043  9.1%
r                  60050   4.9%
a                  56377   4.6%
k                  55023   4.5%
b                  54185   4.4%
g                  53508   4.4%
Other values (11)  196838  16.0%

party_drug_physical
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct      6
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   3.9 MiB

no drugs              52218
G                     1239
under drug influence  864
not applicable        677
sleepy/fatigued       369

Length

Max length     21
Median length  8
Mean length    8.163846
Min length     1

Characters and Unicode

Total characters     452473
Distinct characters  23
Distinct categories  5
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  no drugs
2nd row  no drugs
3rd row  no drugs
4th row  no drugs
5th row  G

Common Values

Value                  Count  Frequency (%)
no drugs               52218  94.2%
G                      1239   2.2%
under drug influence   864    1.6%
not applicable         677    1.2%
sleepy/fatigued        369    0.7%
impairment - physical  57     0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value             Count  Frequency (%)
no                52218  47.4%
drugs             52218  47.4%
g                 1239   1.1%
under             864    0.8%
drug              864    0.8%
influence         864    0.8%
not               677    0.6%
applicable        677    0.6%
sleepy/fatigued   369    0.3%
impairment        57     0.1%
Other values (2)  114    0.1%

Most occurring characters

Value              Count  Frequency (%)
n                  55544  12.3%
u                  55179  12.2%
(space)            54737  12.1%
d                  54315  12.0%
r                  54003  11.9%
g                  53451  11.8%
o                  52895  11.7%
s                  52644  11.6%
e                  4433   1.0%
l                  2644   0.6%
Other values (13)  12628  2.8%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   396071  87.5%
Space Separator    54737   12.1%
Uppercase Letter   1239    0.3%
Other Punctuation  369     0.1%
Dash Punctuation   57      < 0.1%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
n                 55544  14.0%
u                 55179  13.9%
d                 54315  13.7%
r                 54003  13.6%
g                 53451  13.5%
o                 52895  13.4%
s                 52644  13.3%
e                 4433   1.1%
l                 2644   0.7%
i                 2081   0.5%
Other values (9)  8882   2.2%
Space Separator
Value    Count  Frequency (%)
(space)  54737  100.0%
Uppercase Letter
Value  Count  Frequency (%)
G      1239   100.0%
Other Punctuation
Value  Count  Frequency (%)
/      369    100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      57     100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   397310  87.8%
Common  55163   12.2%

Most frequent character per script

Latin
Value              Count  Frequency (%)
n                  55544  14.0%
u                  55179  13.9%
d                  54315  13.7%
r                  54003  13.6%
g                  53451  13.5%
o                  52895  13.3%
s                  52644  13.3%
e                  4433   1.1%
l                  2644   0.7%
i                  2081   0.5%
Other values (10)  10121  2.5%
Common
Value    Count  Frequency (%)
(space)  54737  99.2%
/        369    0.7%
-        57     0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  452473  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
n                  55544  12.3%
u                  55179  12.2%
(space)            54737  12.1%
d                  54315  12.0%
r                  54003  11.9%
g                  53451  11.8%
o                  52895  11.7%
s                  52644  11.6%
e                  4433   1.0%
l                  2644   0.6%
Other values (13)  12628  2.8%

cellphone_in_use
Boolean

IMBALANCE 

Distinct      2
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   487.1 KiB

Value  Count  Frequency (%)
False  54396  98.1%
True   1028   1.9%

month
Real number (ℝ)

Distinct      12
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          3.2244154
Minimum       1
Maximum       12
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   649.5 KiB

Quantile statistics

Minimum                    1
5-th percentile            1
Q1                         2
median                     3
Q3                         4
95-th percentile           6
Maximum                    12
Range                      11
Interquartile range (IQR)  2

Descriptive statistics

Standard deviation               1.8118304
Coefficient of variation (CV)    0.56190975
Kurtosis                         3.3055255
Mean                             3.2244154
Median Absolute Deviation (MAD)  1
Skewness                         1.2313921
Sum                              178710
Variance                         3.2827296
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=12)
Value             Count  Frequency (%)
3                 11334  20.4%
1                 10744  19.4%
4                 10626  19.2%
2                 10239  18.5%
5                 9456   17.1%
6                 1289   2.3%
8                 373    0.7%
7                 332    0.6%
9                 319    0.6%
10                253    0.5%
Other values (2)  459    0.8%

Value  Count  Frequency (%)
1      10744  19.4%
2      10239  18.5%
3      11334  20.4%
4      10626  19.2%
5      9456   17.1%
6      1289   2.3%
7      332    0.6%
8      373    0.7%
9      319    0.6%
10     253    0.5%

Value  Count  Frequency (%)
12     210    0.4%
11     249    0.4%
10     253    0.5%
9      319    0.6%
8      373    0.7%
7      332    0.6%
6      1289   2.3%
5      9456   17.1%
4      10626  19.2%
3      11334  20.4%

Interactions

[Figure: 36 pairwise interaction plots]

Correlations

variable vehicle_age county_city_location distance collision_time insurance_premium month vehicle_type vehicle_transmission direction intersection weather_1 location_type primary_collision_factor pcf_violation_category road_surface road_condition_1 lighting at_fault party_sobriety party_drug_physical cellphone_in_use
vehicle_age 1.000 -0.003 0.036 -0.018 0.560 0.021 0.148 0.059 0.022 0.045 0.009 0.049 0.000 0.051 0.019 0.000 0.081 0.134 0.062 0.032 0.016
county_city_location -0.003 1.000 0.038 0.001 0.007 0.012 0.130 0.099 0.196 0.235 0.165 0.257 0.029 0.106 0.246 0.116 0.154 0.137 0.099 0.081 0.357
distance 0.036 0.038 1.000 -0.040 0.005 0.037 0.043 0.028 0.088 0.172 0.030 0.154 0.000 0.068 0.056 0.028 0.093 0.091 0.016 0.033 0.006
collision_time -0.018 0.001 -0.040 1.000 -0.021 0.001 0.064 0.068 0.032 0.063 0.050 0.047 0.010 0.147 0.045 0.023 0.455 0.147 0.202 0.069 0.010
insurance_premium 0.560 0.007 0.005 -0.021 1.000 0.012 0.109 0.050 0.024 0.050 0.012 0.058 0.013 0.063 0.018 0.007 0.076 0.178 0.231 0.223 0.012
month 0.021 0.012 0.037 0.001 0.012 1.000 0.070 0.044 0.016 0.029 0.075 0.028 0.034 0.043 0.105 0.018 0.080 0.024 0.043 0.093 0.013
vehicle_type 0.148 0.130 0.043 0.064 0.109 0.070 1.000 0.118 0.065 0.124 0.024 0.108 0.014 0.371 0.024 0.017 0.047 0.273 0.105 0.076 0.010
vehicle_transmission 0.059 0.099 0.028 0.068 0.050 0.044 0.118 1.000 0.034 0.047 0.009 0.038 0.012 0.072 0.009 0.000 0.041 0.078 0.082 0.049 0.009
direction 0.022 0.196 0.088 0.032 0.024 0.016 0.065 0.034 1.000 0.967 0.025 0.261 0.002 0.309 0.034 0.033 0.064 0.072 0.031 0.019 0.017
intersection 0.045 0.235 0.172 0.063 0.050 0.029 0.124 0.047 0.967 1.000 0.038 0.438 0.006 0.625 0.050 0.054 0.126 0.073 0.066 0.037 0.015
weather_1 0.009 0.165 0.030 0.050 0.012 0.075 0.024 0.009 0.025 0.038 1.000 0.041 0.009 0.036 0.536 0.051 0.038 0.052 0.014 0.014 0.021
location_type 0.049 0.257 0.154 0.047 0.058 0.028 0.108 0.038 0.261 0.438 0.041 1.000 0.025 0.282 0.051 0.066 0.098 0.029 0.057 0.049 0.021
primary_collision_factor 0.000 0.029 0.000 0.010 0.013 0.034 0.014 0.012 0.002 0.006 0.009 0.025 1.000 1.000 0.017 0.010 0.008 0.004 0.016 0.022 0.000
pcf_violation_category 0.051 0.106 0.068 0.147 0.063 0.043 0.371 0.072 0.309 0.625 0.036 0.282 1.000 1.000 0.068 0.037 0.181 0.251 0.327 0.109 0.022
road_surface 0.019 0.246 0.056 0.045 0.018 0.105 0.024 0.009 0.034 0.050 0.536 0.051 0.017 0.068 1.000 0.107 0.035 0.066 0.009 0.010 0.019
road_condition_1 0.000 0.116 0.028 0.023 0.007 0.018 0.017 0.000 0.033 0.054 0.051 0.066 0.010 0.037 0.107 1.000 0.028 0.025 0.010 0.010 0.007
lighting 0.081 0.154 0.093 0.455 0.076 0.080 0.047 0.041 0.064 0.126 0.038 0.098 0.008 0.181 0.035 0.028 1.000 0.099 0.169 0.052 0.006
at_fault 0.134 0.137 0.091 0.147 0.178 0.024 0.273 0.078 0.072 0.073 0.052 0.029 0.004 0.251 0.066 0.025 0.099 1.000 0.297 0.169 0.009
party_sobriety 0.062 0.099 0.016 0.202 0.231 0.043 0.105 0.082 0.031 0.066 0.014 0.057 0.016 0.327 0.009 0.010 0.169 0.297 1.000 0.634 0.021
party_drug_physical 0.032 0.081 0.033 0.069 0.223 0.093 0.076 0.049 0.019 0.037 0.014 0.049 0.022 0.109 0.010 0.010 0.052 0.169 0.634 1.000 0.009
cellphone_in_use 0.016 0.357 0.006 0.010 0.012 0.013 0.010 0.009 0.017 0.015 0.021 0.021 0.000 0.022 0.019 0.007 0.006 0.009 0.021 0.009 1.000
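
The "High correlation" alerts in the overview can be traced back to this matrix. The following is a minimal Python sketch that filters a handful of coefficients copied from the table by absolute value. The 0.5 cutoff is an assumption (the report does not state the threshold it uses), and the names `correlations` and `high_correlation_pairs` are illustrative.

```python
# Selected off-diagonal coefficients copied from the correlation table.
correlations = {
    ("vehicle_age", "insurance_premium"): 0.560,
    ("direction", "intersection"): 0.967,
    ("intersection", "pcf_violation_category"): 0.625,
    ("intersection", "location_type"): 0.438,
    ("weather_1", "road_surface"): 0.536,
    ("primary_collision_factor", "pcf_violation_category"): 1.000,
    ("party_sobriety", "party_drug_physical"): 0.634,
    ("collision_time", "lighting"): 0.455,
    ("county_city_location", "cellphone_in_use"): 0.357,
}

THRESHOLD = 0.5  # assumed cutoff, not stated in the report


def high_correlation_pairs(coeffs, threshold=THRESHOLD):
    """Return (pair, value) tuples with |value| >= threshold, strongest first."""
    flagged = [(pair, value) for pair, value in coeffs.items()
               if abs(value) >= threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


flagged_pairs = high_correlation_pairs(correlations)
for (left, right), value in flagged_pairs:
    print(f"{left} ~ {right}: {value:.3f}")
```

Under that assumed cutoff, the six pairs that pass are the same six variable pairs named in the alerts, while pairs such as collision_time and lighting (0.455) fall just short.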

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

index | vehicle_type | vehicle_transmission | vehicle_age | county_city_location | distance | direction | intersection | weather_1 | location_type | primary_collision_factor | pcf_violation_category | road_surface | road_condition_1 | lighting | collision_time | at_fault | insurance_premium | party_sobriety | party_drug_physical | cellphone_in_use | month
0 | sedan | manual | 3.0 | 1000 | 20.0 | east | False | clear | intersection | vehicle code violation | automobile right of way | dry | normal | daylight | 15 | False | 20.0 | had not been drinking | no drugs | False | 1
1 | sedan | manual | 6.0 | 1964 | 100.0 | south | False | clear | highway | vehicle code violation | speeding | dry | construction | dark with street lights | 1 | False | 33.0 | had been drinking, under influence | no drugs | False | 2
2 | sedan | auto | 2.0 | 3711 | 0.0 | unknown | True | clear | road | vehicle code violation | automobile right of way | dry | normal | daylight | 14 | False | 64.0 | had not been drinking | no drugs | False | 1
3 | sedan | auto | 2.0 | 1920 | 150.0 | east | False | clear | road | vehicle code violation | automobile right of way | dry | normal | dusk or dawn | 17 | False | 32.0 | had not been drinking | no drugs | False | 1
4 | coupe | manual | 2.0 | 3900 | 500.0 | south | False | clear | highway | vehicle code violation | improper turning | dry | normal | dusk or dawn | 17 | True | 18.0 | impairment unknown | G | False | 1
5 | sedan | auto | 15.0 | 3300 | 0.0 | unknown | True | clear | road | vehicle code violation | automobile right of way | dry | normal | daylight | 13 | True | 65.0 | had not been drinking | no drugs | False | 1
6 | sedan | auto | 8.0 | 3300 | 0.0 | unknown | True | clear | road | vehicle code violation | automobile right of way | dry | normal | daylight | 13 | False | 53.0 | had not been drinking | no drugs | False | 1
7 | sedan | manual | 0.0 | 0700 | 50.0 | west | False | clear | road | vehicle code violation | speeding | dry | normal | daylight | 15 | False | 40.0 | had not been drinking | no drugs | False | 1
8 | sedan | manual | 8.0 | 3905 | 200.0 | north | False | clear | intersection | vehicle code violation | dui | dry | normal | dark with no street lights | 19 | False | 51.0 | had not been drinking | no drugs | False | 1
9 | sedan | auto | 0.0 | 3607 | 0.0 | unknown | True | cloudy | road | vehicle code violation | speeding | dry | normal | daylight | 16 | False | 24.0 | had not been drinking | no drugs | False | 2

index | vehicle_type | vehicle_transmission | vehicle_age | county_city_location | distance | direction | intersection | weather_1 | location_type | primary_collision_factor | pcf_violation_category | road_surface | road_condition_1 | lighting | collision_time | at_fault | insurance_premium | party_sobriety | party_drug_physical | cellphone_in_use | month
55606 | coupe | manual | 6.0 | 0712 | 0.0 | unknown | True | clear | road | vehicle code violation | unsafe lane change | dry | normal | daylight | 11 | False | 43.0 | had not been drinking | no drugs | False | 5
55607 | coupe | manual | 3.0 | 1976 | 75.0 | south | False | raining | highway | vehicle code violation | improper turning | wet | normal | dark with street lights | 4 | True | 24.0 | had been drinking, impairment unknown | no drugs | False | 3
55608 | sedan | auto | 3.0 | 5606 | 178.0 | north | False | cloudy | road | vehicle code violation | improper turning | dry | normal | daylight | 7 | True | 63.0 | had not been drinking | no drugs | False | 4
55609 | sedan | manual | 3.0 | 1900 | 26.0 | south | False | clear | road | vehicle code violation | improper turning | dry | normal | daylight | 11 | True | 63.0 | had not been drinking | no drugs | False | 3
55610 | sedan | auto | 6.0 | 5701 | 0.0 | unknown | True | raining | road | vehicle code violation | dui | wet | normal | dark with street lights | 19 | False | 38.0 | had not been drinking | no drugs | False | 1
55611 | sedan | manual | 2.0 | 3700 | 38.0 | east | False | clear | road | vehicle code violation | speeding | dry | normal | daylight | 7 | False | 53.0 | had not been drinking | no drugs | False | 4
55612 | coupe | manual | 3.0 | 1955 | 132.0 | south | False | clear | road | vehicle code violation | improper turning | dry | normal | daylight | 12 | True | 18.0 | had been drinking, not under influence | no drugs | False | 3
55613 | sedan | manual | 2.0 | 0791 | 7235.0 | south | False | clear | highway | vehicle code violation | unsafe lane change | dry | normal | daylight | 7 | True | 21.0 | had not been drinking | no drugs | False | 5
55614 | sedan | manual | 3.0 | 1942 | 300.0 | south | False | raining | highway | vehicle code violation | speeding | wet | normal | daylight | 12 | True | 29.0 | had not been drinking | no drugs | False | 3
55615 | sedan | manual | 3.0 | 3713 | 31.0 | south | False | clear | road | vehicle code violation | following too closely | dry | normal | daylight | 16 | True | 42.0 | had not been drinking | no drugs | False | 4

Duplicate rows

Most frequently occurring

index | vehicle_type | vehicle_transmission | vehicle_age | county_city_location | distance | direction | intersection | weather_1 | location_type | primary_collision_factor | pcf_violation_category | road_surface | road_condition_1 | lighting | collision_time | at_fault | insurance_premium | party_sobriety | party_drug_physical | cellphone_in_use | month | # duplicates
19 | hatchback | auto | 2.0 | 3711 | 0.0 | unknown | True | clear | road | vehicle code violation | pedestrian right of way | dry | normal | daylight | 15 | False | 17.0 | had not been drinking | no drugs | False | 1 | 3
0 | coupe | auto | 3.0 | 1942 | 0.0 | unknown | True | clear | road | vehicle code violation | automobile right of way | dry | normal | dark with street lights | 20 | False | 21.0 | had not been drinking | no drugs | False | 4 | 2
1 | coupe | auto | 3.0 | 1942 | 0.0 | unknown | True | clear | road | vehicle code violation | traffic signals and signs | dry | normal | dark with street lights | 21 | False | 23.0 | had not been drinking | no drugs | False | 4 | 2
2 | coupe | auto | 3.0 | 1953 | 700.0 | west | False | clear | highway | vehicle code violation | speeding | dry | normal | daylight | 14 | False | 24.0 | had not been drinking | no drugs | False | 5 | 2
3 | coupe | auto | 3.0 | 3711 | 27.0 | south | False | clear | road | vehicle code violation | speeding | dry | normal | daylight | 16 | True | 23.0 | had not been drinking | no drugs | False | 2 | 2
4 | coupe | auto | 5.0 | 3019 | 20.0 | east | False | cloudy | road | vehicle code violation | dui | wet | normal | dark with street lights | 4 | False | 37.0 | had not been drinking | no drugs | False | 1 | 2
5 | coupe | auto | 5.0 | 4200 | 1056.0 | north | False | clear | highway | vehicle code violation | speeding | dry | normal | dark with no street lights | 4 | False | 33.0 | had not been drinking | no drugs | False | 5 | 2
6 | coupe | auto | 6.0 | 5600 | 26.0 | east | False | clear | highway | vehicle code violation | speeding | dry | normal | dark with no street lights | 17 | False | 42.0 | had not been drinking | no drugs | False | 1 | 2
7 | coupe | auto | 7.0 | 1920 | 0.0 | unknown | True | clear | road | vehicle code violation | traffic signals and signs | dry | normal | daylight | 9 | False | 48.0 | had not been drinking | no drugs | False | 3 | 2
8 | coupe | auto | 9.0 | 0105 | 457.0 | north | False | cloudy | highway | vehicle code violation | speeding | dry | normal | daylight | 15 | False | 60.0 | had not been drinking | no drugs | False | 3 | 2